Overview

Dataset statistics

Number of variables: 9
Number of observations: 102816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-11-27 20:37:07.259174
Analysis finished: 2023-11-27 20:37:12.741212
Duration: 5.48 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5
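
The quartiles above are what linear interpolation between order statistics gives for 12 equally frequent years (8568 rows each, per the value counts in this report). A minimal sketch, assuming the profiler interpolates the same way as Python's `statistics.quantiles(..., method="inclusive")`; the `years` list is rebuilt from the reported counts, not read from the original data:

```python
import statistics

# Rebuild REF_DATE from the reported counts: 12 years, 8568 rows each.
years = [year for year in range(2010, 2022) for _ in range(8568)]

# "inclusive" interpolates linearly between order statistics and treats
# the smallest and largest values as the 0th and 100th percentiles.
q1, median, q3 = statistics.quantiles(years, n=4, method="inclusive")
iqr = q3 - q1

print(q1, median, q3, iqr)
```

The fractional quartiles arise because the 25% and 75% positions fall between the last row of one year and the first row of the next.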

Descriptive statistics

Standard deviation: 3.4520693
Coefficient of variation (CV): 0.0017127608
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.0722565 × 10^8
Variance: 11.916783
Monotonicity: Increasing
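
The sum, mean, variance, standard deviation, and CV above can be re-derived from the value counts alone. A minimal sketch, assuming the reported variance uses the sample (n - 1) denominator; that assumption is made here because it reproduces 11.916783, whereas the population formula would give about 11.916667:

```python
import math
from fractions import Fraction

n_years, rows_per_year = 12, 8568
n = n_years * rows_per_year  # 102816 observations

years = range(2010, 2010 + n_years)
total = rows_per_year * sum(years)
mean = Fraction(total, n)

# Sum of squared deviations, then the sample variance (assumed n - 1).
ss = rows_per_year * sum((y - mean) ** 2 for y in years)
variance = ss / (n - 1)
std = math.sqrt(variance)
cv = std / float(mean)

print(total, float(mean), float(variance), std, cv)
```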
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
2010               8568   8.3%
2011               8568   8.3%
2012               8568   8.3%
2013               8568   8.3%
2014               8568   8.3%
2015               8568   8.3%
2016               8568   8.3%
2017               8568   8.3%
2018               8568   8.3%
2019               8568   8.3%
Other values (2)  17136  16.7%

Minimum 10 values

Value  Count  Frequency (%)
2010    8568  8.3%
2011    8568  8.3%
2012    8568  8.3%
2013    8568  8.3%
2014    8568  8.3%
2015    8568  8.3%
2016    8568  8.3%
2017    8568  8.3%
2018    8568  8.3%
2019    8568  8.3%

Maximum 10 values

Value  Count  Frequency (%)
2021    8568  8.3%
2020    8568  8.3%
2019    8568  8.3%
2018    8568  8.3%
2017    8568  8.3%
2016    8568  8.3%
2015    8568  8.3%
2014    8568  8.3%
2013    8568  8.3%
2012    8568  8.3%

DGUID
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value             Count
2016A000011124     7560
2016A000212        7560
2016A000213        7560
2016A000224        7560
2016A000235        7560
Other values (9)  65016

Length

Max length14
Median length11
Mean length11.220588
Min length11

Characters and Unicode

Total characters1153656
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
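
The mean length and total character count for DGUID follow from the code lengths. A small sketch, assuming the only 14-character code is the national one (`2016A000011124`, 7560 rows) and that every other code has 11 characters; this is consistent with the reported minimum and maximum lengths but is not stated outright in the report:

```python
# Counts taken from this report.
n_rows = 102816
long_rows, long_len, short_len = 7560, 14, 11  # assumed length split

total_chars = long_rows * long_len + (n_rows - long_rows) * short_len
mean_len = total_chars / n_rows

print(total_chars, round(mean_len, 6))
```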

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.4%
2016A000212 7560
 
7.4%
2016A000213 7560
 
7.4%
2016A000224 7560
 
7.4%
2016A000235 7560
 
7.4%
2016A000246 7560
 
7.4%
2016A000247 7560
 
7.4%
2016A000248 7560
 
7.4%
2016A000259 7560
 
7.4%
2016A000210 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value             Count
Canada             7560
Nova Scotia        7560
New Brunswick      7560
Quebec             7560
Ontario            7560
Other values (9)  65016

Length

Max length25
Median length16
Mean length11.72549
Min length5

Characters and Unicode

Total characters1205568
Distinct characters34
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.4%
Nova Scotia 7560
 
7.4%
New Brunswick 7560
 
7.4%
Quebec 7560
 
7.4%
Ontario 7560
 
7.4%
Manitoba 7560
 
7.4%
Saskatchewan 7560
 
7.4%
Alberta 7560
 
7.4%
British Columbia 7560
 
7.4%
Newfoundland and Labrador 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value                                                                 Count
Total non-profit institutions                                         21168
Total non-profit institutions excluding governments                   21168
Government non-profit institutions                                    21168
Non-profit institutions serving households (community organizations)  19656
Business non-profit institutions                                      19656

Length

Max length68
Median length34
Mean length42.588235
Min length29

Characters and Unicode

Total characters4378752
Distinct characters29
Distinct categories6 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.6%
Total non-profit institutions excluding governments 21168
20.6%
Government non-profit institutions 21168
20.6%
Non-profit institutions serving households (community organizations) 19656
19.1%
Business non-profit institutions 19656
19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3826872
87.4%
Space Separator 306936
 
7.0%
Dash Punctuation 102816
 
2.3%
Uppercase Letter 102816
 
2.3%
Open Punctuation 19656
 
0.4%
Close Punctuation 19656
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value                         Count
Male employees                 5880
Not a visible minority         5880
55 to 64 years                 5880
Female employees               5880
High school diploma and less   5880
Other values (13)             73416

Length

Max length33
Median length28
Mean length19.415033
Min length14

Characters and Unicode

Total characters1996176
Distinct characters37
Distinct categories5 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.7%
Not a visible minority 5880
 
5.7%
55 to 64 years 5880
 
5.7%
Female employees 5880
 
5.7%
High school diploma and less 5880
 
5.7%
College diploma 5880
 
5.7%
Visible minority 5880
 
5.7%
25 to 34 years 5880
 
5.7%
35 to 44 years 5880
 
5.7%
45 to 54 years 5880
 
5.7%
Other values (8) 44016
42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value                        Count
Number of jobs               14688
Hours worked                 14688
Wages and salaries           14688
Average annual hours worked  14688
Average weekly hours worked  14688
Other values (2)             29376

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2203200
Distinct characters25
Distinct categories3 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 14688
14.3%
Hours worked 14688
14.3%
Wages and salaries 14688
14.3%
Average annual hours worked 14688
14.3%
Average weekly hours worked 14688
14.3%
Average annual wages and salaries 14688
14.3%
Average hourly wage 14688
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value    Count
Hours    44064
Dollars  44064
Jobs     14688

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters587520
Distinct characters10
Distinct categories2 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 44064
42.9%
Dollars 44064
42.9%
Jobs 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value      Count
units      73440
thousands  14688
millions   14688

Length

Max length9
Median length5
Mean length6
Min length5

Characters and Unicode

Total characters616896
Distinct characters11
Distinct categories1 ?
Distinct scripts1 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct: 35398
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
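
Several of the VALUE figures are tied together by simple identities (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × n, IQR = Q3 - Q1). A short sketch that checks the reported numbers against each other; every constant is copied from this report, and the tolerance is an arbitrary choice that only allows for rounding in the printed values:

```python
import math

# Values copied from the report for VALUE.
n = 102816
mean, std = 26419.543, 118041.35
q1, median, q3 = 32, 1386, 17660

cv = std / mean
variance = std ** 2
total = mean * n
iqr = q3 - q1

# Compare each derived figure with the one printed in the report.
checks = {
    "CV": math.isclose(cv, 4.4679558, rel_tol=1e-6),
    "Variance": math.isclose(variance, 1.393376e10, rel_tol=1e-6),
    "Sum": math.isclose(total, 2.7163518e9, rel_tol=1e-6),
    "IQR": iqr == 17628,
}
print(checks)
# A mean far above the median is typical of a strongly right-skewed column.
print("mean / median =", round(mean / median, 1))
```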
Histogram with fixed size bins (bins=50)
Common values

Value                 Count  Frequency (%)
30                     2040   2.0%
31                     1949   1.9%
32                     1844   1.8%
29                     1562   1.5%
33                     1476   1.4%
34                     1005   1.0%
28                      905   0.9%
35                      638   0.6%
27                      523   0.5%
36                      457   0.4%
Other values (35388)  90417  87.9%

Minimum 10 values

Value  Count  Frequency (%)
0          9  < 0.1%
1        219  0.2%
2        299  0.3%
3        201  0.2%
4        203  0.2%
5        157  0.2%
6        147  0.1%
7        135  0.1%
8        179  0.2%
9        145  0.1%

Maximum 10 values

Value    Count  Frequency (%)
3857813      1  < 0.1%
3690238      1  < 0.1%
3643952      1  < 0.1%
3581317      1  < 0.1%
3556730      1  < 0.1%
3544822      1  < 0.1%
3487265      1  < 0.1%
3425218      1  < 0.1%
3402343      1  < 0.1%
3372266      1  < 0.1%

Interactions


Correlations

                 REF_DATE  VALUE  DGUID  GEO    Sector  Characteristics  Indicators  UOM    SCALAR_FACTOR
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.105
DGUID            0.000     0.090  1.000  1.000  0.029   0.023            0.000       0.000  0.000
GEO              0.000     0.090  1.000  1.000  0.029   0.023            0.000       0.000  0.000
Sector           0.000     0.051  0.029  0.029  1.000   0.018            0.000       0.000  0.000
Characteristics  0.000     0.044  0.023  0.023  0.018   1.000            0.000       0.000  0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  0.447
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  1.000
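
The report does not say which association measure fills this matrix. The sketch below assumes uncorrected Cramér's V for categorical pairs and rebuilds two entries from information already in the report: each of the 7 indicators has 14688 rows, and the sample rows show which UOM and SCALAR_FACTOR go with each indicator. The mapping `indicator_units` and the function `cramers_v` are illustrative names, and the mapping is assumed to hold for every row:

```python
import math
from collections import Counter

ROWS_PER_INDICATOR = 14688  # each of the 7 indicators, per the report

# (UOM, SCALAR_FACTOR) for each indicator, as seen in the sample rows.
indicator_units = {
    "Number of jobs": ("Jobs", "units"),
    "Hours worked": ("Hours", "thousands"),
    "Wages and salaries": ("Dollars", "millions"),
    "Average annual hours worked": ("Hours", "units"),
    "Average weekly hours worked": ("Hours", "units"),
    "Average annual wages and salaries": ("Dollars", "units"),
    "Average hourly wage": ("Dollars", "units"),
}

def cramers_v(pairs: Counter) -> float:
    """Uncorrected Cramér's V from a Counter of (a, b) -> count."""
    n = sum(pairs.values())
    row, col = Counter(), Counter()
    for (a, b), count in pairs.items():
        row[a] += count
        col[b] += count
    chi2 = 0.0
    for a in row:
        for b in col:
            expected = row[a] * col[b] / n
            chi2 += (pairs.get((a, b), 0) - expected) ** 2 / expected
    return math.sqrt(chi2 / (n * (min(len(row), len(col)) - 1)))

uom_vs_scalar = Counter()
indicator_vs_uom = Counter()
for indicator, (uom, scalar) in indicator_units.items():
    uom_vs_scalar[(uom, scalar)] += ROWS_PER_INDICATOR
    indicator_vs_uom[(indicator, uom)] += ROWS_PER_INDICATOR

print(round(cramers_v(uom_vs_scalar), 3))
print(round(cramers_v(indicator_vs_uom), 3))
```

Under these assumptions the UOM / SCALAR_FACTOR value comes out at about 0.447 and the Indicators / UOM value at 1, in line with the matrix. A value of 1 here only means that each indicator determines its unit; it says nothing about the measured values.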

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

Last rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63
Pandas Profiling Report with Columns Sorted and NA Removed

Overview

Dataset statistics

Number of variables: 9
Number of observations: 102816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other field (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-11-28 20:52:09.458864
Analysis finished: 2023-11-28 20:52:14.255360
Duration: 4.8 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520693
Coefficient of variation (CV): 0.0017127608
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.0722565 × 10^8
Variance: 11.916783
Monotonicity: Increasing
Histogram with fixed size bins (bins=12)
Common values

Value             Count  Frequency (%)
2010               8568   8.3%
2011               8568   8.3%
2012               8568   8.3%
2013               8568   8.3%
2014               8568   8.3%
2015               8568   8.3%
2016               8568   8.3%
2017               8568   8.3%
2018               8568   8.3%
2019               8568   8.3%
Other values (2)  17136  16.7%

Minimum 10 values

Value  Count  Frequency (%)
2010    8568  8.3%
2011    8568  8.3%
2012    8568  8.3%
2013    8568  8.3%
2014    8568  8.3%
2015    8568  8.3%
2016    8568  8.3%
2017    8568  8.3%
2018    8568  8.3%
2019    8568  8.3%

Maximum 10 values

Value  Count  Frequency (%)
2021    8568  8.3%
2020    8568  8.3%
2019    8568  8.3%
2018    8568  8.3%
2017    8568  8.3%
2016    8568  8.3%
2015    8568  8.3%
2014    8568  8.3%
2013    8568  8.3%
2012    8568  8.3%

DGUID
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Value             Count
2016A000011124     7560
2016A000212        7560
2016A000213        7560
2016A000224        7560
2016A000235        7560
Other values (9)  65016

Length

Max length14
Median length11
Mean length11.220588
Min length11

Characters and Unicode

Total characters1153656
Distinct characters11
Distinct categories2 ?
Distinct scripts2 ?
Distinct blocks1 ?
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0 ?
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.4%
2016A000212 7560
 
7.4%
2016A000213 7560
 
7.4%
2016A000224 7560
 
7.4%
2016A000235 7560
 
7.4%
2016A000246 7560
 
7.4%
2016A000247 7560
 
7.4%
2016A000248 7560
 
7.4%
2016A000259 7560
 
7.4%
2016A000210 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Canada: 7560
Nova Scotia: 7560
New Brunswick: 7560
Quebec: 7560
Ontario: 7560
Other values (9): 65016

Length

Max length: 25
Median length: 16
Mean length: 11.72549
Min length: 5

Characters and Unicode

Total characters: 1205568
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.4%
Nova Scotia 7560
 
7.4%
New Brunswick 7560
 
7.4%
Quebec 7560
 
7.4%
Ontario 7560
 
7.4%
Manitoba 7560
 
7.4%
Saskatchewan 7560
 
7.4%
Alberta 7560
 
7.4%
British Columbia 7560
 
7.4%
Newfoundland and Labrador 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Government non-profit institutions: 21168
Non-profit institutions serving households (community organizations): 19656
Business non-profit institutions: 19656

Length

Max length: 68
Median length: 34
Mean length: 42.588235
Min length: 29

Characters and Unicode

Total characters: 4378752
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.6%
Total non-profit institutions excluding governments 21168
20.6%
Government non-profit institutions 21168
20.6%
Non-profit institutions serving households (community organizations) 19656
19.1%
Business non-profit institutions 19656
19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3826872
87.4%
Space Separator 306936
 
7.0%
Dash Punctuation 102816
 
2.3%
Uppercase Letter 102816
 
2.3%
Open Punctuation 19656
 
0.4%
Close Punctuation 19656
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Male employees: 5880
Not a visible minority: 5880
55 to 64 years: 5880
Female employees: 5880
High school diploma and less: 5880
Other values (13): 73416

Length

Max length: 33
Median length: 28
Mean length: 19.415033
Min length: 14

Characters and Unicode

Total characters: 1996176
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.7%
Not a visible minority 5880
 
5.7%
55 to 64 years 5880
 
5.7%
Female employees 5880
 
5.7%
High school diploma and less 5880
 
5.7%
College diploma 5880
 
5.7%
Visible minority 5880
 
5.7%
25 to 34 years 5880
 
5.7%
35 to 44 years 5880
 
5.7%
45 to 54 years 5880
 
5.7%
Other values (8) 44016
42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Number of jobs: 14688
Hours worked: 14688
Wages and salaries: 14688
Average annual hours worked: 14688
Average weekly hours worked: 14688
Other values (2): 29376

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2203200
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 14688
14.3%
Hours worked 14688
14.3%
Wages and salaries 14688
14.3%
Average annual hours worked 14688
14.3%
Average weekly hours worked 14688
14.3%
Average annual wages and salaries 14688
14.3%
Average hourly wage 14688
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Hours: 44064
Dollars: 44064
Jobs: 14688

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 587520
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

ValueCountFrequency (%)
Hours 44064
42.9%
Dollars 44064
42.9%
Jobs 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

units: 73440
thousands: 14688
millions: 14688

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 616896
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct: 35398
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
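
Several of the VALUE figures above are tied together by simple identities (CV = standard deviation / mean, variance = standard deviation squared, sum = mean × number of observations, IQR = Q3 − Q1). The following is a small sanity-check sketch in Python, not part of the profiling report itself; the numbers are copied from the report and the variable names are chosen here for illustration only.

```python
# Figures copied from the VALUE summary of the report.
n = 102816          # number of observations
mean = 26419.543
std = 118041.35
q1, q3 = 32, 17660

# Derived quantities, recomputed from the figures above.
cv = std / mean          # coefficient of variation
variance = std ** 2      # variance is the squared standard deviation
total = mean * n         # sum of all values
iqr = q3 - q1            # interquartile range

print(f"CV       = {cv:.4f}")
print(f"Variance = {variance:.4e}")
print(f"Sum      = {total:.4e}")
print(f"IQR      = {iqr}")
```

The recomputed values agree with the reported CV, variance, sum and IQR up to the rounding of the inputs. A CV well above 1 together with a median (1386) far below the mean reflects the mix of units in this column (jobs, thousands of hours, millions of dollars, hourly wages).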
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
2.0%
31 1949
 
1.9%
32 1844
 
1.8%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
1.0%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
87.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.2%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

Interactions

[Figure: four interaction plots]

Correlations

 | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.029 | 0.029 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.023 | 0.023 | 0.018 | 1.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000
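
The 0.447 entry for UOM against SCALAR_FACTOR and the 1.000 entries for Indicators can be reproduced by hand. The sketch below assumes the matrix reports Cramér's V for pairs of categorical variables; the report does not name the measure, so this is an assumption, and `cramers_v` is a helper written here for illustration. The seven (UOM, SCALAR_FACTOR) pairs are read off the first sample rows of the report, one per indicator; because Indicators is uniformly distributed, one pair per indicator gives the same V as the full table.

```python
from collections import Counter
from math import sqrt

def cramers_v(pairs):
    """Uncorrected Cramér's V for a list of (x, y) category pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    row = Counter(x for x, _ in pairs)
    col = Counter(y for _, y in pairs)
    chi2 = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            chi2 += (joint[(x, y)] - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k))

# (UOM, SCALAR_FACTOR) for each of the seven indicators, from the sample rows.
indicator_units = [
    ("Jobs", "units"),        # Number of jobs
    ("Hours", "thousands"),   # Hours worked
    ("Dollars", "millions"),  # Wages and salaries
    ("Hours", "units"),       # Average annual hours worked
    ("Hours", "units"),       # Average weekly hours worked
    ("Dollars", "units"),     # Average annual wages and salaries
    ("Dollars", "units"),     # Average hourly wage
]

uom_vs_scalar = cramers_v(indicator_units)

# Each indicator fixes its UOM, so Indicators vs UOM is a perfect association.
indicator_vs_uom = cramers_v(
    [(i, uom) for i, (uom, _) in enumerate(indicator_units)]
)

print(round(uom_vs_scalar, 3), round(indicator_vs_uom, 3))
```

Under that assumption the high-correlation alerts are structural rather than informative: each indicator determines its unit and scalar factor, and DGUID is simply the code for GEO.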

Missing values

[Figure] A simple visualization of nullity by column.
[Figure] Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

(index) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63

Pandas Profiling Report with Columns Sorted and NA Removed

Overview

Dataset statistics

Number of variables: 9
Number of observations: 102816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other fields (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-12-20 15:26:45.881475
Analysis finished: 2023-12-20 15:26:50.765458
Duration: 4.88 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520693
Coefficient of variation (CV): 0.0017127608
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.0722565 × 10^8
Variance: 11.916783
Monotonicity: Increasing
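
Because REF_DATE holds the 12 years 2010 to 2021 with 8568 rows each, its summary statistics follow directly from the counts. The sketch below recomputes the mean, variance, standard deviation and sum; it assumes the report uses the sample variance (n − 1 denominator), which is consistent with the reported 11.916783 but is not stated in the report.

```python
from math import sqrt

years = range(2010, 2022)    # the 12 reference years in the report
rows_per_year = 8568         # count shown for every year
n = rows_per_year * len(years)

mean = sum(years) / len(years)
# Sum of squared deviations, weighted by the identical per-year row count.
sum_sq = rows_per_year * sum((y - mean) ** 2 for y in years)
variance = sum_sq / (n - 1)  # sample variance (assumed n - 1 denominator)
std = sqrt(variance)
total = rows_per_year * sum(years)

print(n, mean, total)
print(f"{variance:.6f} {std:.7f}")
```

Zero skewness and the negative kurtosis reported above are what one would expect from a perfectly flat distribution over equally spaced years.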
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8568
8.3%
2011 8568
8.3%
2012 8568
8.3%
2013 8568
8.3%
2014 8568
8.3%
2015 8568
8.3%
2016 8568
8.3%
2017 8568
8.3%
2018 8568
8.3%
2019 8568
8.3%
Other values (2) 17136
16.7%
ValueCountFrequency (%)
2010 8568
8.3%
2011 8568
8.3%
2012 8568
8.3%
2013 8568
8.3%
2014 8568
8.3%
2015 8568
8.3%
2016 8568
8.3%
2017 8568
8.3%
2018 8568
8.3%
2019 8568
8.3%
ValueCountFrequency (%)
2021 8568
8.3%
2020 8568
8.3%
2019 8568
8.3%
2018 8568
8.3%
2017 8568
8.3%
2016 8568
8.3%
2015 8568
8.3%
2014 8568
8.3%
2013 8568
8.3%
2012 8568
8.3%

DGUID
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

2016A000011124: 7560
2016A000212: 7560
2016A000213: 7560
2016A000224: 7560
2016A000235: 7560
Other values (9): 65016

Length

Max length: 14
Median length: 11
Mean length: 11.220588
Min length: 11

Characters and Unicode

Total characters: 1153656
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
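
The DGUID length and character totals above can be cross-checked with a few lines of arithmetic. This sketch assumes that the national code 2016A000011124 (7560 rows) is the only 14-character value and that the other 13 codes are all 11 characters long, which matches the minimum and maximum lengths reported; the variable names are illustrative.

```python
rows_total = 102816
canada_rows = 7560                       # rows with the 14-character code
other_rows = rows_total - canada_rows    # rows with 11-character codes (assumed)

total_chars = canada_rows * 14 + other_rows * 11
mean_length = total_chars / rows_total

print(total_chars, round(mean_length, 6))
```

Under that assumption the computed total matches the reported 1153656 characters exactly, and the mean length rounds to the reported 11.220588.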

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.4%
2016A000212 7560
 
7.4%
2016A000213 7560
 
7.4%
2016A000224 7560
 
7.4%
2016A000235 7560
 
7.4%
2016A000246 7560
 
7.4%
2016A000247 7560
 
7.4%
2016A000248 7560
 
7.4%
2016A000259 7560
 
7.4%
2016A000210 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Canada: 7560
Nova Scotia: 7560
New Brunswick: 7560
Quebec: 7560
Ontario: 7560
Other values (9): 65016

Length

Max length: 25
Median length: 16
Mean length: 11.72549
Min length: 5

Characters and Unicode

Total characters: 1205568
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.4%
Nova Scotia 7560
 
7.4%
New Brunswick 7560
 
7.4%
Quebec 7560
 
7.4%
Ontario 7560
 
7.4%
Manitoba 7560
 
7.4%
Saskatchewan 7560
 
7.4%
Alberta 7560
 
7.4%
British Columbia 7560
 
7.4%
Newfoundland and Labrador 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Government non-profit institutions: 21168
Non-profit institutions serving households (community organizations): 19656
Business non-profit institutions: 19656

Length

Max length: 68
Median length: 34
Mean length: 42.588235
Min length: 29

Characters and Unicode

Total characters: 4378752
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.6%
Total non-profit institutions excluding governments 21168
20.6%
Government non-profit institutions 21168
20.6%
Non-profit institutions serving households (community organizations) 19656
19.1%
Business non-profit institutions 19656
19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 3826872 | 87.4%
Space Separator | 306936 | 7.0%
Dash Punctuation | 102816 | 2.3%
Uppercase Letter | 102816 | 2.3%
Open Punctuation | 19656 | 0.4%
Close Punctuation | 19656 | 0.4%
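
The category counts above correspond to Unicode general categories. As a rough illustration (not necessarily how the profiling tool computes them), Python's standard `unicodedata.category` can reproduce the totals from the five Sector labels and their row counts. The two-letter codes map to the report's names as follows: `Ll` lowercase letter, `Lu` uppercase letter, `Zs` space separator, `Pd` dash punctuation, `Ps` open punctuation, `Pe` close punctuation.

```python
import unicodedata
from collections import Counter

# Category labels and counts copied from the Sector section of the report.
sector_counts = {
    "Total non-profit institutions": 21168,
    "Total non-profit institutions excluding governments": 21168,
    "Government non-profit institutions": 21168,
    "Non-profit institutions serving households (community organizations)": 19656,
    "Business non-profit institutions": 19656,
}

# Weight each character's general category by the number of rows
# that carry the label it belongs to.
category_counts = Counter()
for name, count in sector_counts.items():
    for ch in name:
        category_counts[unicodedata.category(ch)] += count

for code, total in category_counts.most_common():
    print(code, total)
```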

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Male employees: 5880
Not a visible minority: 5880
55 to 64 years: 5880
Female employees: 5880
High school diploma and less: 5880
Other values (13): 73416

Length

Max length: 33
Median length: 28
Mean length: 19.415033
Min length: 14

Characters and Unicode

Total characters: 1996176
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.7%
Not a visible minority | 5880 | 5.7%
55 to 64 years | 5880 | 5.7%
Female employees | 5880 | 5.7%
High school diploma and less | 5880 | 5.7%
College diploma | 5880 | 5.7%
Visible minority | 5880 | 5.7%
25 to 34 years | 5880 | 5.7%
35 to 44 years | 5880 | 5.7%
45 to 54 years | 5880 | 5.7%
Other values (8) | 44016 | 42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

Alerts: HIGH CORRELATION, UNIFORM

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Number of jobs: 14688
Hours worked: 14688
Wages and salaries: 14688
Average annual hours worked: 14688
Average weekly hours worked: 14688
Other values (2): 29376

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2203200
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 14688 | 14.3%
Hours worked | 14688 | 14.3%
Wages and salaries | 14688 | 14.3%
Average annual hours worked | 14688 | 14.3%
Average weekly hours worked | 14688 | 14.3%
Average annual wages and salaries | 14688 | 14.3%
Average hourly wage | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]
ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

Alerts: HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Hours: 44064
Dollars: 44064
Jobs: 14688

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 587520
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value | Count | Frequency (%)
Hours | 44064 | 42.9%
Dollars | 44064 | 42.9%
Jobs | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]
ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

Alerts: HIGH CORRELATION

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

units: 73440
thousands: 14688
millions: 14688

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 616896
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 73440 | 71.4%
thousands | 14688 | 14.3%
millions | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]
ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct: 35398
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
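
Several of the VALUE statistics are derived from others, which gives a quick consistency check. The sketch below assumes the usual definitions (range = maximum − minimum, IQR = Q3 − Q1, CV = standard deviation / mean, variance = standard deviation squared) and uses figures copied from the report; the variable names are illustrative:

```python
# Figures copied from the VALUE section of the report.
mean = 26419.543
std_dev = 118041.35
minimum, q1, q3, maximum = 0, 32, 17660, 3857813

# Derived statistics under the usual textbook definitions.
value_range = maximum - minimum
iqr = q3 - q1
cv = std_dev / mean
variance = std_dev ** 2

print("Range:", value_range)
print("IQR:", iqr)
print("CV:", round(cv, 4))
print(f"Variance: {variance:.6e}")
```

A CV well above 1, together with the large skewness and kurtosis, is consistent with VALUE mixing indicators measured on very different scales (job counts, hours, dollars).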
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 2.0%
31 | 1949 | 1.9%
32 | 1844 | 1.8%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 1.0%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 87.9%

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.2%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

Interactions

[Figures: interaction plots for the numeric variables REF_DATE and VALUE]

Correlations

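Variable | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.029 | 0.029 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.023 | 0.023 | 0.018 | 1.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000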
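
The categorical entries in this matrix look like Cramér's V values. The sketch below is a hedged reconstruction, not the profiling tool's own code: it computes the plain (uncorrected) Cramér's V, whereas the tool may apply a bias correction. It also assumes that each indicator always carries the UOM and SCALAR_FACTOR seen in the report's Sample rows, and that each indicator has 14688 rows as the Indicators table states. All names are illustrative.

```python
from collections import Counter
from math import sqrt

# Indicator -> (UOM, SCALAR_FACTOR), read off the report's Sample rows.
# Assumption: the same mapping holds for every row of the dataset.
indicator_units = {
    "Number of jobs": ("Jobs", "units"),
    "Hours worked": ("Hours", "thousands"),
    "Wages and salaries": ("Dollars", "millions"),
    "Average annual hours worked": ("Hours", "units"),
    "Average weekly hours worked": ("Hours", "units"),
    "Average annual wages and salaries": ("Dollars", "units"),
    "Average hourly wage": ("Dollars", "units"),
}
ROWS_PER_INDICATOR = 14688  # count per indicator from the Indicators table


def cramers_v(pairs):
    """Uncorrected Cramér's V from a Counter keyed by (x, y) pairs."""
    n = sum(pairs.values())
    row_totals, col_totals = Counter(), Counter()
    for (x, y), count in pairs.items():
        row_totals[x] += count
        col_totals[y] += count
    # Pearson chi-squared statistic over the full contingency table.
    chi2 = 0.0
    for x, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n
            observed = pairs.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))


uom_scalar = Counter()
indicator_uom = Counter()
for indicator, (uom, scalar) in indicator_units.items():
    uom_scalar[(uom, scalar)] += ROWS_PER_INDICATOR
    indicator_uom[(indicator, uom)] += ROWS_PER_INDICATOR

v_uom_scalar = cramers_v(uom_scalar)
v_indicator_uom = cramers_v(indicator_uom)

print("UOM vs SCALAR_FACTOR:", round(v_uom_scalar, 3))
print("Indicators vs UOM:", round(v_indicator_uom, 3))
```

Under these assumptions the values agree with the matrix entries for UOM/SCALAR_FACTOR and Indicators/UOM. The perfect association arises because each indicator fixes its unit of measure, which is also why the report raises high-correlation alerts for these columns.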

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

(index) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

(index) | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63
Pandas Profiling Report with Columns Sorted and NA Removed

Overview

Dataset statistics

Number of variables: 9
Number of observations: 102816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 80.0 B
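
The two memory figures are consistent with each other: multiplying the number of observations by the average record size gives the reported total. A small arithmetic sketch, assuming MiB means 2^20 bytes (variable names are illustrative):

```python
# Figures copied from the dataset statistics.
n_observations = 102816
avg_record_bytes = 80.0

# Total memory implied by the average record size.
total_bytes = n_observations * avg_record_bytes
total_mib = total_bytes / 2**20  # 1 MiB = 2**20 bytes

print(f"{total_bytes:.0f} bytes = {total_mib:.2f} MiB")
```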

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other fields (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-12-20 15:45:26.720464
Analysis finished: 2023-12-20 15:45:31.257847
Duration: 4.54 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520693
Coefficient of variation (CV): 0.0017127608
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.0722565 × 10^8
Variance: 11.916783
Monotonicity: Increasing
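
Because REF_DATE takes each year from 2010 to 2021 exactly 8568 times, its descriptive statistics can be rebuilt from the frequency table. The sketch below assumes the variance uses the sample (n − 1) denominator, which is what matches the reported figure; the variable names are illustrative:

```python
from math import sqrt

# Each year from 2010 to 2021 appears 8568 times (per the frequency table).
rows_per_year = 8568
values = [year for year in range(2010, 2022) for _ in range(rows_per_year)]

n = len(values)
total = sum(values)
mean = total / n

# Sample variance with an n - 1 denominator (an assumption that
# reproduces the figure shown in the report).
variance = sum((v - mean) ** 2 for v in values) / (n - 1)
std_dev = sqrt(variance)
cv = std_dev / mean

print("n:", n)
print("Sum:", total)
print("Mean:", mean)
print("Variance:", round(variance, 6))
print("Standard deviation:", round(std_dev, 7))
print("CV:", round(cv, 10))
```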
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2010 | 8568 | 8.3%
2011 | 8568 | 8.3%
2012 | 8568 | 8.3%
2013 | 8568 | 8.3%
2014 | 8568 | 8.3%
2015 | 8568 | 8.3%
2016 | 8568 | 8.3%
2017 | 8568 | 8.3%
2018 | 8568 | 8.3%
2019 | 8568 | 8.3%
Other values (2) | 17136 | 16.7%

Value | Count | Frequency (%)
2010 | 8568 | 8.3%
2011 | 8568 | 8.3%
2012 | 8568 | 8.3%
2013 | 8568 | 8.3%
2014 | 8568 | 8.3%
2015 | 8568 | 8.3%
2016 | 8568 | 8.3%
2017 | 8568 | 8.3%
2018 | 8568 | 8.3%
2019 | 8568 | 8.3%

Value | Count | Frequency (%)
2021 | 8568 | 8.3%
2020 | 8568 | 8.3%
2019 | 8568 | 8.3%
2018 | 8568 | 8.3%
2017 | 8568 | 8.3%
2016 | 8568 | 8.3%
2015 | 8568 | 8.3%
2014 | 8568 | 8.3%
2013 | 8568 | 8.3%
2012 | 8568 | 8.3%

DGUID
Categorical

Alerts: HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

2016A000011124: 7560
2016A000212: 7560
2016A000213: 7560
2016A000224: 7560
2016A000235: 7560
Other values (9): 65016

Length

Max length: 14
Median length: 11
Mean length: 11.220588
Min length: 11

Characters and Unicode

Total characters: 1153656
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value | Count | Frequency (%)
2016A000011124 | 7560 | 7.4%
2016A000212 | 7560 | 7.4%
2016A000213 | 7560 | 7.4%
2016A000224 | 7560 | 7.4%
2016A000235 | 7560 | 7.4%
2016A000246 | 7560 | 7.4%
2016A000247 | 7560 | 7.4%
2016A000248 | 7560 | 7.4%
2016A000259 | 7560 | 7.4%
2016A000210 | 7224 | 7.0%
Other values (4) | 27552 | 26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

Alerts: HIGH CORRELATION

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Canada: 7560
Nova Scotia: 7560
New Brunswick: 7560
Quebec: 7560
Ontario: 7560
Other values (9): 65016

Length

Max length: 25
Median length: 16
Mean length: 11.72549
Min length: 5

Characters and Unicode

Total characters: 1205568
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.4%
Nova Scotia | 7560 | 7.4%
New Brunswick | 7560 | 7.4%
Quebec | 7560 | 7.4%
Ontario | 7560 | 7.4%
Manitoba | 7560 | 7.4%
Saskatchewan | 7560 | 7.4%
Alberta | 7560 | 7.4%
British Columbia | 7560 | 7.4%
Newfoundland and Labrador | 7224 | 7.0%
Other values (4) | 27552 | 26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Government non-profit institutions: 21168
Non-profit institutions serving households (community organizations): 19656
Business non-profit institutions: 19656

Length

Max length: 68
Median length: 34
Mean length: 42.588235
Min length: 29

Characters and Unicode

Total characters: 4378752
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.6%
Total non-profit institutions excluding governments | 21168 | 20.6%
Government non-profit institutions | 21168 | 20.6%
Non-profit institutions serving households (community organizations) | 19656 | 19.1%
Business non-profit institutions | 19656 | 19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: plot of the common values]
ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3826872
87.4%
Space Separator 306936
 
7.0%
Dash Punctuation 102816
 
2.3%
Uppercase Letter 102816
 
2.3%
Open Punctuation 19656
 
0.4%
Close Punctuation 19656
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Male employees: 5880
Not a visible minority: 5880
55 to 64 years: 5880
Female employees: 5880
High school diploma and less: 5880
Other values (13): 73416

Length

Max length: 33
Median length: 28
Mean length: 19.415033
Min length: 14

Characters and Unicode

Total characters: 1996176
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.7%
Not a visible minority | 5880 | 5.7%
55 to 64 years | 5880 | 5.7%
Female employees | 5880 | 5.7%
High school diploma and less | 5880 | 5.7%
College diploma | 5880 | 5.7%
Visible minority | 5880 | 5.7%
25 to 34 years | 5880 | 5.7%
35 to 44 years | 5880 | 5.7%
45 to 54 years | 5880 | 5.7%
Other values (8) | 44016 | 42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Number of jobs
14688 
Hours worked
14688 
Wages and salaries
14688 
Average annual hours worked
14688 
Average weekly hours worked
14688 
Other values (2)
29376 

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2203200
Distinct characters25
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
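
As an illustration of the character properties mentioned above, the following sketch uses Python's standard `unicodedata` module to count Unicode general categories over the seven Indicators labels listed under Common Values. The helper name `category_counts` and the variable names are made up for this example; it is not the profiler's own code.

```python
import unicodedata
from collections import Counter

# The seven Indicators labels listed under "Common Values".
labels = [
    "Number of jobs",
    "Hours worked",
    "Wages and salaries",
    "Average annual hours worked",
    "Average weekly hours worked",
    "Average annual wages and salaries",
    "Average hourly wage",
]

def category_counts(text):
    """Count Unicode general categories (e.g. 'Lu', 'Ll', 'Zs') in text."""
    return Counter(unicodedata.category(ch) for ch in text)

# Aggregate over one copy of each label.
totals = Counter()
for label in labels:
    totals += category_counts(label)

distinct_categories = sorted(totals)   # category codes that occur at all
chars_per_pass = sum(totals.values())  # characters in one copy of each label

print(distinct_categories, chars_per_pass)
```

Because every label occurs 14688 times, multiplying `chars_per_pass` by 14688 should reproduce the reported total character count, and the number of distinct categories can be compared with the "Distinct categories" figure.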

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 14688
14.3%
Hours worked 14688
14.3%
Wages and salaries 14688
14.3%
Average annual hours worked 14688
14.3%
Average weekly hours worked 14688
14.3%
Average annual wages and salaries 14688
14.3%
Average hourly wage 14688
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Hours
44064 
Dollars
44064 
Jobs
14688 

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters587520
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 44064
42.9%
Dollars 44064
42.9%
Jobs 14688
 
14.3%
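
The length statistics for UOM follow directly from the value counts above. A minimal sketch of that arithmetic, with variable names chosen only for this example:

```python
# Value counts for UOM as listed under "Common Values".
counts = {"Hours": 44064, "Dollars": 44064, "Jobs": 14688}

total_rows = sum(counts.values())
# Each label contributes len(label) characters per occurrence.
total_chars = sum(len(label) * n for label, n in counts.items())
mean_length = total_chars / total_rows

lengths = [len(label) for label in counts]
min_length, max_length = min(lengths), max(lengths)

print(total_rows, total_chars, round(mean_length, 7), min_length, max_length)
```

The results can be compared with the "Total characters", "Mean length", "Min length" and "Max length" figures reported for this variable.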

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
units
73440 
thousands
14688 
millions
14688 

Length

Max length9
Median length5
Mean length6
Min length5

Characters and Unicode

Total characters616896
Distinct characters11
Distinct categories1
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowunits
2nd rowthousands
3rd rowmillions
4th rowunits
5th rowunits

Common Values

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct35398
Distinct (%)34.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean26419.543
Minimum0
Maximum3857813
Zeros9
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size1.6 MiB

Quantile statistics

Minimum0
5-th percentile18.6075
Q132
median1386
Q317660
95-th percentile85576.5
Maximum3857813
Range3857813
Interquartile range (IQR)17628

Descriptive statistics

Standard deviation118041.35
Coefficient of variation (CV)4.4679558
Kurtosis266.93641
Mean26419.543
Median Absolute Deviation (MAD)1358
Skewness13.831207
Sum2.7163518 × 109
Variance1.393376 × 1010
MonotonicityNot monotonic
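
Several of the VALUE statistics above are derived from one another. The sketch below re-derives them from the reported (rounded) figures as a consistency check; it assumes the usual definitions (IQR = Q3 - Q1, CV = standard deviation / mean, variance = standard deviation squared, sum = mean x number of observations), and small discrepancies from rounding are expected.

```python
import math

# Figures copied from the report (already rounded there).
n_obs = 102816
minimum, maximum = 0.0, 3857813.0
q1, q3 = 32.0, 17660.0
mean, std = 26419.543, 118041.35

value_range = maximum - minimum
iqr = q3 - q1
cv = std / mean          # coefficient of variation
variance = std ** 2
total = mean * n_obs     # approximate, because the mean is rounded

# Compare against the reported values within a loose relative tolerance.
checks = {
    "cv": math.isclose(cv, 4.4679558, rel_tol=1e-6),
    "variance": math.isclose(variance, 1.393376e10, rel_tol=1e-6),
    "sum": math.isclose(total, 2.7163518e9, rel_tol=1e-6),
}
print(value_range, iqr, checks)
```

A CV well above 1 together with the large skewness is consistent with a heavily right-skewed column, which is plausible here because VALUE mixes job counts, hours and dollar amounts at different scalar factors.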
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
30 2040
 
2.0%
31 1949
 
1.9%
32 1844
 
1.8%
29 1562
 
1.5%
33 1476
 
1.4%
34 1005
 
1.0%
28 905
 
0.9%
35 638
 
0.6%
27 523
 
0.5%
36 457
 
0.4%
Other values (35388) 90417
87.9%
ValueCountFrequency (%)
0 9
 
< 0.1%
1 219
0.2%
2 299
0.3%
3 201
0.2%
4 203
0.2%
5 157
0.2%
6 147
0.1%
7 135
0.1%
8 179
0.2%
9 145
0.1%
ValueCountFrequency (%)
3857813 1
< 0.1%
3690238 1
< 0.1%
3643952 1
< 0.1%
3581317 1
< 0.1%
3556730 1
< 0.1%
3544822 1
< 0.1%
3487265 1
< 0.1%
3425218 1
< 0.1%
3402343 1
< 0.1%
3372266 1
< 0.1%

Interactions

[Figures: interaction plots]

Correlations

Variable | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.029 | 0.029 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.023 | 0.023 | 0.018 | 1.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000
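
The categorical entries in this matrix are consistent with an uncorrected Cramér's V. The sketch below is an assumption about how such values can be obtained, not the profiler's actual implementation: it computes Cramér's V in plain Python for the seven Indicators / UOM / SCALAR_FACTOR combinations visible in the sample rows. Since each combination occurs equally often in the data, one copy of each is enough. The function name `cramers_v` is chosen for this example.

```python
from collections import Counter
from math import sqrt

# (Indicators, UOM, SCALAR_FACTOR) combinations taken from the sample rows.
rows = [
    ("Number of jobs", "Jobs", "units"),
    ("Hours worked", "Hours", "thousands"),
    ("Wages and salaries", "Dollars", "millions"),
    ("Average annual hours worked", "Hours", "units"),
    ("Average weekly hours worked", "Hours", "units"),
    ("Average annual wages and salaries", "Dollars", "units"),
    ("Average hourly wage", "Dollars", "units"),
]

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equally long categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row_totals, col_totals = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, r in row_totals.items():
        for y, c in col_totals.items():
            expected = r * c / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))

indicators, uom, scalar = zip(*rows)
v_ind_uom = cramers_v(indicators, uom)
v_ind_scalar = cramers_v(indicators, scalar)
v_uom_scalar = cramers_v(uom, scalar)
print(round(v_ind_uom, 3), round(v_ind_scalar, 3), round(v_uom_scalar, 3))
```

Under this assumption, a value of 1 for Indicators against UOM or SCALAR_FACTOR simply reflects that each indicator fixes its unit and scalar factor, which is why those pairs raise High correlation alerts.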

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

First rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00

Last rows

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63

Pandas Profiling Report with Columns Sorted and NA Removed

Overview

Dataset statistics

Number of variables9
Number of observations102816
Missing cells0
Missing cells (%)0.0%
Duplicate rows0
Duplicate rows (%)0.0%
Total size in memory7.8 MiB
Average record size in memory80.0 B

Variable types

Numeric2
Categorical7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other fields (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started2023-12-20 15:51:11.900713
Analysis finished2023-12-20 15:51:23.087595
Duration11.19 seconds
Software versionydata-profiling v4.5.1

Variables

REF_DATE
Real number (ℝ)

Distinct12
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2015.5
Minimum2010
Maximum2021
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size1.6 MiB

Quantile statistics

Minimum2010
5-th percentile2010
Q12012.75
median2015.5
Q32018.25
95-th percentile2021
Maximum2021
Range11
Interquartile range (IQR)5.5

Descriptive statistics

Standard deviation3.4520693
Coefficient of variation (CV)0.0017127608
Kurtosis-1.216784
Mean2015.5
Median Absolute Deviation (MAD)3
Skewness0
Sum2.0722565 × 108
Variance11.916783
MonotonicityIncreasing
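
The REF_DATE statistics can be reproduced from the value counts alone, since each of the 12 years from 2010 to 2021 occurs 8568 times. The sketch below assumes the reported variance is the sample variance (divisor n - 1); the variable names are chosen for this example.

```python
from math import sqrt

years = range(2010, 2022)   # 2010 .. 2021 inclusive, 12 distinct values
per_year = 8568             # every year occurs equally often
n = per_year * len(years)

mean = sum(years) / len(years)
# Sum of squared deviations over all rows, using the equal counts per year.
sum_sq = per_year * sum((y - mean) ** 2 for y in years)

sample_variance = sum_sq / (n - 1)
std = sqrt(sample_variance)
cv = std / mean

print(n, mean, round(sample_variance, 6), round(std, 7), round(cv, 10))
```

The printed values can be compared with the reported mean, variance, standard deviation and coefficient of variation. A skewness of 0 is likewise expected because the counts are symmetric around the mean.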
Histogram with fixed size bins (bins=12)
ValueCountFrequency (%)
2010 8568
8.3%
2011 8568
8.3%
2012 8568
8.3%
2013 8568
8.3%
2014 8568
8.3%
2015 8568
8.3%
2016 8568
8.3%
2017 8568
8.3%
2018 8568
8.3%
2019 8568
8.3%
Other values (2) 17136
16.7%
ValueCountFrequency (%)
2010 8568
8.3%
2011 8568
8.3%
2012 8568
8.3%
2013 8568
8.3%
2014 8568
8.3%
2015 8568
8.3%
2016 8568
8.3%
2017 8568
8.3%
2018 8568
8.3%
2019 8568
8.3%
ValueCountFrequency (%)
2021 8568
8.3%
2020 8568
8.3%
2019 8568
8.3%
2018 8568
8.3%
2017 8568
8.3%
2016 8568
8.3%
2015 8568
8.3%
2014 8568
8.3%
2013 8568
8.3%
2012 8568
8.3%

DGUID
Categorical

HIGH CORRELATION 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
2016A000011124
7560 
2016A000212
7560 
2016A000213
7560 
2016A000224
7560 
2016A000235
7560 
Other values (9)
65016 

Length

Max length14
Median length11
Mean length11.220588
Min length11

Characters and Unicode

Total characters1153656
Distinct characters11
Distinct categories2
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row2016A000011124
2nd row2016A000011124
3rd row2016A000011124
4th row2016A000011124
5th row2016A000011124

Common Values

ValueCountFrequency (%)
2016A000011124 7560
 
7.4%
2016A000212 7560
 
7.4%
2016A000213 7560
 
7.4%
2016A000224 7560
 
7.4%
2016A000235 7560
 
7.4%
2016A000246 7560
 
7.4%
2016A000247 7560
 
7.4%
2016A000248 7560
 
7.4%
2016A000259 7560
 
7.4%
2016A000210 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

HIGH CORRELATION 

Distinct14
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Canada
7560 
Nova Scotia
7560 
New Brunswick
7560 
Quebec
7560 
Ontario
7560 
Other values (9)
65016 

Length

Max length25
Median length16
Mean length11.72549
Min length5

Characters and Unicode

Total characters1205568
Distinct characters34
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowCanada
2nd rowCanada
3rd rowCanada
4th rowCanada
5th rowCanada

Common Values

ValueCountFrequency (%)
Canada 7560
 
7.4%
Nova Scotia 7560
 
7.4%
New Brunswick 7560
 
7.4%
Quebec 7560
 
7.4%
Ontario 7560
 
7.4%
Manitoba 7560
 
7.4%
Saskatchewan 7560
 
7.4%
Alberta 7560
 
7.4%
British Columbia 7560
 
7.4%
Newfoundland and Labrador 7224
 
7.0%
Other values (4) 27552
26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct5
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Total non-profit institutions
21168 
Total non-profit institutions excluding governments
21168 
Government non-profit institutions
21168 
Non-profit institutions serving households (community organizations)
19656 
Business non-profit institutions
19656 

Length

Max length68
Median length34
Mean length42.588235
Min length29

Characters and Unicode

Total characters4378752
Distinct characters29
Distinct categories6
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowTotal non-profit institutions
2nd rowTotal non-profit institutions
3rd rowTotal non-profit institutions
4th rowTotal non-profit institutions
5th rowTotal non-profit institutions

Common Values

ValueCountFrequency (%)
Total non-profit institutions 21168
20.6%
Total non-profit institutions excluding governments 21168
20.6%
Government non-profit institutions 21168
20.6%
Non-profit institutions serving households (community organizations) 19656
19.1%
Business non-profit institutions 19656
19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3826872
87.4%
Space Separator 306936
 
7.0%
Dash Punctuation 102816
 
2.3%
Uppercase Letter 102816
 
2.3%
Open Punctuation 19656
 
0.4%
Close Punctuation 19656
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct18
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Male employees
 
5880
Not a visible minority
 
5880
55 to 64 years
 
5880
Female employees
 
5880
High school diploma and less
 
5880
Other values (13)
73416 

Length

Max length33
Median length28
Mean length19.415033
Min length14

Characters and Unicode

Total characters1996176
Distinct characters37
Distinct categories5
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMale employees
2nd rowMale employees
3rd rowMale employees
4th rowMale employees
5th rowMale employees

Common Values

ValueCountFrequency (%)
Male employees 5880
 
5.7%
Not a visible minority 5880
 
5.7%
55 to 64 years 5880
 
5.7%
Female employees 5880
 
5.7%
High school diploma and less 5880
 
5.7%
College diploma 5880
 
5.7%
Visible minority 5880
 
5.7%
25 to 34 years 5880
 
5.7%
35 to 44 years 5880
 
5.7%
45 to 54 years 5880
 
5.7%
Other values (8) 44016
42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct7
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Number of jobs
14688 
Hours worked
14688 
Wages and salaries
14688 
Average annual hours worked
14688 
Average weekly hours worked
14688 
Other values (2)
29376 

Length

Max length33
Median length19
Mean length21.428571
Min length12

Characters and Unicode

Total characters2203200
Distinct characters25
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowNumber of jobs
2nd rowHours worked
3rd rowWages and salaries
4th rowAverage annual hours worked
5th rowAverage weekly hours worked

Common Values

ValueCountFrequency (%)
Number of jobs 14688
14.3%
Hours worked 14688
14.3%
Wages and salaries 14688
14.3%
Average annual hours worked 14688
14.3%
Average weekly hours worked 14688
14.3%
Average annual wages and salaries 14688
14.3%
Average hourly wage 14688
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct3
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size1.6 MiB
Hours
44064 
Dollars
44064 
Jobs
14688 

Length

Max length7
Median length5
Mean length5.7142857
Min length4

Characters and Unicode

Total characters587520
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowJobs
2nd rowHours
3rd rowDollars
4th rowHours
5th rowHours

Common Values

ValueCountFrequency (%)
Hours 44064
42.9%
Dollars 44064
42.9%
Jobs 14688
 
14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

[Figure: common values plot]
ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

units: 73440
thousands: 14688
millions: 14688

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 616896
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 73440 | 71.4%
thousands | 14688 | 14.3%
millions | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct: 35398
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 2.0%
31 | 1949 | 1.9%
32 | 1844 | 1.8%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 1.0%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 87.9%

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.2%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

Interactions

[Figures: four interaction plots for the numeric variables REF_DATE and VALUE]

Correlations

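Variable | REF_DATE | VALUE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR
REF_DATE | 1.000 | 0.027 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 | 0.000
VALUE | 0.027 | 1.000 | 0.090 | 0.090 | 0.051 | 0.044 | 0.075 | 0.081 | 0.105
DGUID | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
GEO | 0.000 | 0.090 | 1.000 | 1.000 | 0.029 | 0.023 | 0.000 | 0.000 | 0.000
Sector | 0.000 | 0.051 | 0.029 | 0.029 | 1.000 | 0.018 | 0.000 | 0.000 | 0.000
Characteristics | 0.000 | 0.044 | 0.023 | 0.023 | 0.018 | 1.000 | 0.000 | 0.000 | 0.000
Indicators | 0.000 | 0.075 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 1.000
UOM | 0.000 | 0.081 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 1.000 | 0.447
SCALAR_FACTOR | 0.000 | 0.105 | 0.000 | 0.000 | 0.000 | 0.000 | 1.000 | 0.447 | 1.000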
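
The 0.447 entry for UOM versus SCALAR_FACTOR and the 1.000 entry for Indicators versus UOM can be checked by hand. The sketch below assumes the report's association measure for categorical pairs is Cramér's V (computed here without any bias correction, which is an assumption). The contingency tables are built from figures in this report: each of the seven indicators has 14688 rows, and the Sample rows show which UOM and SCALAR_FACTOR each indicator carries. The name `cramers_v` is illustrative only.

```python
from math import sqrt

def cramers_v(table):
    """Cramér's V for a contingency table given as a list of rows of counts."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row_totals), len(col_totals)) - 1
    return sqrt(chi2 / (n * k))

ROWS_PER_INDICATOR = 14688  # every indicator appears 14688 times

def scaled(indicator_counts):
    """Turn 'number of indicators per cell' into row counts."""
    return [[c * ROWS_PER_INDICATOR for c in row] for row in indicator_counts]

# UOM (rows: Jobs, Hours, Dollars) by SCALAR_FACTOR (units, thousands, millions)
uom_by_scalar = scaled([
    [1, 0, 0],  # Jobs: Number of jobs
    [2, 1, 0],  # Hours: two averages in units, Hours worked in thousands
    [2, 0, 1],  # Dollars: two averages in units, Wages and salaries in millions
])

# Indicators (seven rows) by UOM (Jobs, Hours, Dollars)
indicators_by_uom = scaled([
    [1, 0, 0],  # Number of jobs
    [0, 1, 0],  # Hours worked
    [0, 0, 1],  # Wages and salaries
    [0, 1, 0],  # Average annual hours worked
    [0, 1, 0],  # Average weekly hours worked
    [0, 0, 1],  # Average annual wages and salaries
    [0, 0, 1],  # Average hourly wage
])

uom_vs_scalar = cramers_v(uom_by_scalar)
indicators_vs_uom = cramers_v(indicators_by_uom)
print(round(uom_vs_scalar, 3), round(indicators_vs_uom, 3))
```

Under these assumptions the two values agree with the matrix to three decimals, which is consistent with Indicators fully determining both UOM and SCALAR_FACTOR.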

Missing values

A simple visualization of nullity by column.
Nullity matrix is a data-dense display which lets you quickly visually pick out patterns in data completion.

Sample

Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
0 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Number of jobs | Jobs | units | 642584.00
1 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Hours worked | Hours | thousands | 1048516.00
2 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Wages and salaries | Dollars | millions | 30805.00
3 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual hours worked | Hours | units | 1632.00
4 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average weekly hours worked | Hours | units | 31.00
5 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average annual wages and salaries | Dollars | units | 47940.00
6 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Male employees | Average hourly wage | Dollars | units | 29.38
7 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Number of jobs | Jobs | units | 1500394.00
8 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Hours worked | Hours | thousands | 2331018.00
9 | 2010 | 2016A000011124 | Canada | Total non-profit institutions | Female employees | Wages and salaries | Dollars | millions | 60943.00
Index | REF_DATE | DGUID | GEO | Sector | Characteristics | Indicators | UOM | SCALAR_FACTOR | VALUE
105830 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average weekly hours worked | Hours | units | 33.00
105831 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average annual wages and salaries | Dollars | units | 101380.00
105832 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 55 to 64 years | Average hourly wage | Dollars | units | 59.98
105833 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Number of jobs | Jobs | units | 27.00
105834 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Hours worked | Hours | thousands | 30.00
105835 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Wages and salaries | Dollars | millions | 2.00
105836 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual hours worked | Hours | units | 1111.00
105837 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average weekly hours worked | Hours | units | 21.00
105838 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average annual wages and salaries | Dollars | units | 74037.00
105839 | 2021 | 2016A000262 | Nunavut | Government non-profit institutions | 65 years old and over | Average hourly wage | Dollars | units | 66.63
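
VALUE is only comparable across rows once SCALAR_FACTOR is applied. The following is a minimal sketch, assuming "thousands" and "millions" mean the stored value should be multiplied by 1,000 and 1,000,000 respectively. The `SCALE` mapping and the `absolute_value` helper are hypothetical names; the three rows are copied from the Nunavut "65 years old and over" sample above.

```python
# Assumed multipliers for each SCALAR_FACTOR label (not stated in the report).
SCALE = {"units": 1, "thousands": 1_000, "millions": 1_000_000}

def absolute_value(value, scalar_factor):
    """Return VALUE expressed in base units of its UOM."""
    return value * SCALE[scalar_factor]

# (Indicators, SCALAR_FACTOR, VALUE) taken from the sample rows
rows = [
    ("Number of jobs", "units", 27.00),
    ("Hours worked", "thousands", 30.00),
    ("Wages and salaries", "millions", 2.00),
]

absolute = {indicator: absolute_value(value, factor)
            for indicator, factor, value in rows}

# Cross-check against the "Average annual hours worked" row for the same group.
hours_per_job = absolute["Hours worked"] / absolute["Number of jobs"]
print(absolute, round(hours_per_job))
```

The derived hours per job round to 1111, the same figure reported in the "Average annual hours worked" row for this group, which supports the scaling assumption.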
Pandas Profiling Report with Columns Sorted and NA Removed

Overview

Dataset statistics

Number of variables: 9
Number of observations: 102816
Missing cells: 0
Missing cells (%): 0.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 7.8 MiB
Average record size in memory: 80.0 B

Variable types

Numeric: 2
Categorical: 7

Alerts

DGUID is highly overall correlated with GEO (High correlation)
GEO is highly overall correlated with DGUID (High correlation)
Indicators is highly overall correlated with UOM and 1 other fields (High correlation)
UOM is highly overall correlated with Indicators (High correlation)
SCALAR_FACTOR is highly overall correlated with Indicators (High correlation)
Indicators is uniformly distributed (Uniform)

Reproduction

Analysis started: 2023-12-20 17:52:27.645441
Analysis finished: 2023-12-20 17:52:32.949334
Duration: 5.3 seconds
Software version: ydata-profiling v4.5.1
Download configuration: config.json

Variables

REF_DATE
Real number (ℝ)

Distinct: 12
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 2015.5
Minimum: 2010
Maximum: 2021
Zeros: 0
Zeros (%): 0.0%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 2010
5-th percentile: 2010
Q1: 2012.75
Median: 2015.5
Q3: 2018.25
95-th percentile: 2021
Maximum: 2021
Range: 11
Interquartile range (IQR): 5.5

Descriptive statistics

Standard deviation: 3.4520693
Coefficient of variation (CV): 0.0017127608
Kurtosis: -1.216784
Mean: 2015.5
Median Absolute Deviation (MAD): 3
Skewness: 0
Sum: 2.0722565 × 10^8
Variance: 11.916783
Monotonicity: Increasing
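
These figures follow from the year counts listed for this variable: twelve years, 2010 through 2021, with 8568 rows each. A minimal check, assuming the report uses the sample (n − 1) variance; the variable names are illustrative:

```python
from math import sqrt

years = range(2010, 2022)   # 2010 .. 2021 inclusive
rows_per_year = 8568        # count reported for every year

n = rows_per_year * len(years)
mean = sum(year * rows_per_year for year in years) / n

# Weighted sum of squared deviations, then the n - 1 (sample) variance.
squared_dev = sum(rows_per_year * (year - mean) ** 2 for year in years)
variance = squared_dev / (n - 1)
std = sqrt(variance)
cv = std / mean             # coefficient of variation

print(n, mean, round(variance, 6), round(std, 7), round(cv, 10))
```

The observation count, mean, variance, standard deviation and CV computed this way agree with the values reported above to the precision shown.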
Histogram with fixed size bins (bins=12)
Value | Count | Frequency (%)
2010 | 8568 | 8.3%
2011 | 8568 | 8.3%
2012 | 8568 | 8.3%
2013 | 8568 | 8.3%
2014 | 8568 | 8.3%
2015 | 8568 | 8.3%
2016 | 8568 | 8.3%
2017 | 8568 | 8.3%
2018 | 8568 | 8.3%
2019 | 8568 | 8.3%
Other values (2) | 17136 | 16.7%

Value | Count | Frequency (%)
2010 | 8568 | 8.3%
2011 | 8568 | 8.3%
2012 | 8568 | 8.3%
2013 | 8568 | 8.3%
2014 | 8568 | 8.3%
2015 | 8568 | 8.3%
2016 | 8568 | 8.3%
2017 | 8568 | 8.3%
2018 | 8568 | 8.3%
2019 | 8568 | 8.3%

Value | Count | Frequency (%)
2021 | 8568 | 8.3%
2020 | 8568 | 8.3%
2019 | 8568 | 8.3%
2018 | 8568 | 8.3%
2017 | 8568 | 8.3%
2016 | 8568 | 8.3%
2015 | 8568 | 8.3%
2014 | 8568 | 8.3%
2013 | 8568 | 8.3%
2012 | 8568 | 8.3%

DGUID
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

2016A000011124: 7560
2016A000212: 7560
2016A000213: 7560
2016A000224: 7560
2016A000235: 7560
Other values (9): 65016

Length

Max length: 14
Median length: 11
Mean length: 11.220588
Min length: 11

Characters and Unicode

Total characters: 1153656
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: 2016A000011124
2nd row: 2016A000011124
3rd row: 2016A000011124
4th row: 2016A000011124
5th row: 2016A000011124

Common Values

Value | Count | Frequency (%)
2016A000011124 | 7560 | 7.4%
2016A000212 | 7560 | 7.4%
2016A000213 | 7560 | 7.4%
2016A000224 | 7560 | 7.4%
2016A000235 | 7560 | 7.4%
2016A000246 | 7560 | 7.4%
2016A000247 | 7560 | 7.4%
2016A000248 | 7560 | 7.4%
2016A000259 | 7560 | 7.4%
2016A000210 | 7224 | 7.0%
Other values (4) | 27552 | 26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
2016a000011124 7560
 
7.4%
2016a000212 7560
 
7.4%
2016a000213 7560
 
7.4%
2016a000224 7560
 
7.4%
2016a000235 7560
 
7.4%
2016a000246 7560
 
7.4%
2016a000247 7560
 
7.4%
2016a000248 7560
 
7.4%
2016a000259 7560
 
7.4%
2016a000210 7224
 
7.0%
Other values (4) 27552
26.8%

Most occurring characters

ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

Most occurring categories

ValueCountFrequency (%)
Decimal Number 1050840
91.1%
Uppercase Letter 102816
 
8.9%

Most frequent character per category

Decimal Number
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Uppercase Letter
ValueCountFrequency (%)
A 102816
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common 1050840
91.1%
Latin 102816
 
8.9%

Most frequent character per script

Common
ValueCountFrequency (%)
0 432600
41.2%
2 227304
21.6%
1 169512
 
16.1%
6 130704
 
12.4%
4 37800
 
3.6%
3 15120
 
1.4%
5 15120
 
1.4%
7 7560
 
0.7%
8 7560
 
0.7%
9 7560
 
0.7%
Latin
ValueCountFrequency (%)
A 102816
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1153656
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
0 432600
37.5%
2 227304
19.7%
1 169512
 
14.7%
6 130704
 
11.3%
A 102816
 
8.9%
4 37800
 
3.3%
3 15120
 
1.3%
5 15120
 
1.3%
7 7560
 
0.7%
8 7560
 
0.7%

GEO
Categorical

HIGH CORRELATION 

Distinct: 14
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Canada: 7560
Nova Scotia: 7560
New Brunswick: 7560
Quebec: 7560
Ontario: 7560
Other values (9): 65016

Length

Max length: 25
Median length: 16
Mean length: 11.72549
Min length: 5

Characters and Unicode

Total characters: 1205568
Distinct characters: 34
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Canada
2nd row: Canada
3rd row: Canada
4th row: Canada
5th row: Canada

Common Values

Value | Count | Frequency (%)
Canada | 7560 | 7.4%
Nova Scotia | 7560 | 7.4%
New Brunswick | 7560 | 7.4%
Quebec | 7560 | 7.4%
Ontario | 7560 | 7.4%
Manitoba | 7560 | 7.4%
Saskatchewan | 7560 | 7.4%
Alberta | 7560 | 7.4%
British Columbia | 7560 | 7.4%
Newfoundland and Labrador | 7224 | 7.0%
Other values (4) | 27552 | 26.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
canada 7560
 
4.7%
manitoba 7560
 
4.7%
nova 7560
 
4.7%
british 7560
 
4.7%
alberta 7560
 
4.7%
saskatchewan 7560
 
4.7%
columbia 7560
 
4.7%
ontario 7560
 
4.7%
quebec 7560
 
4.7%
brunswick 7560
 
4.7%
Other values (12) 86016
53.2%

Most occurring characters

ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 992376
82.3%
Uppercase Letter 154392
 
12.8%
Space Separator 58800
 
4.9%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
a 148176
14.9%
r 88032
 
8.9%
n 87024
 
8.8%
i 74592
 
7.5%
e 73920
 
7.4%
t 73584
 
7.4%
o 73248
 
7.4%
d 58128
 
5.9%
u 49560
 
5.0%
w 44352
 
4.5%
Other values (9) 221760
22.3%
Uppercase Letter
ValueCountFrequency (%)
N 36120
23.4%
C 15120
9.8%
B 15120
9.8%
S 15120
9.8%
Q 7560
 
4.9%
O 7560
 
4.9%
M 7560
 
4.9%
A 7560
 
4.9%
L 7224
 
4.7%
P 7224
 
4.7%
Other values (4) 28224
18.3%
Space Separator
ValueCountFrequency (%)
58800
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1146768
95.1%
Common 58800
 
4.9%

Most frequent character per script

Latin
ValueCountFrequency (%)
a 148176
 
12.9%
r 88032
 
7.7%
n 87024
 
7.6%
i 74592
 
6.5%
e 73920
 
6.4%
t 73584
 
6.4%
o 73248
 
6.4%
d 58128
 
5.1%
u 49560
 
4.3%
w 44352
 
3.9%
Other values (23) 376152
32.8%
Common
ValueCountFrequency (%)
58800
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1205568
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
a 148176
 
12.3%
r 88032
 
7.3%
n 87024
 
7.2%
i 74592
 
6.2%
e 73920
 
6.1%
t 73584
 
6.1%
o 73248
 
6.1%
58800
 
4.9%
d 58128
 
4.8%
u 49560
 
4.1%
Other values (24) 420504
34.9%

Sector
Categorical

Distinct: 5
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Total non-profit institutions: 21168
Total non-profit institutions excluding governments: 21168
Government non-profit institutions: 21168
Non-profit institutions serving households (community organizations): 19656
Business non-profit institutions: 19656

Length

Max length: 68
Median length: 34
Mean length: 42.588235
Min length: 29

Characters and Unicode

Total characters: 4378752
Distinct characters: 29
Distinct categories: 6
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Total non-profit institutions
2nd row: Total non-profit institutions
3rd row: Total non-profit institutions
4th row: Total non-profit institutions
5th row: Total non-profit institutions

Common Values

Value | Count | Frequency (%)
Total non-profit institutions | 21168 | 20.6%
Total non-profit institutions excluding governments | 21168 | 20.6%
Government non-profit institutions | 21168 | 20.6%
Non-profit institutions serving households (community organizations) | 19656 | 19.1%
Business non-profit institutions | 19656 | 19.1%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
non-profit 102816
25.1%
institutions 102816
25.1%
total 42336
10.3%
excluding 21168
 
5.2%
governments 21168
 
5.2%
government 21168
 
5.2%
serving 19656
 
4.8%
households 19656
 
4.8%
community 19656
 
4.8%
organizations 19656
 
4.8%

Most occurring characters

ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 3826872
87.4%
Space Separator 306936
 
7.0%
Dash Punctuation 102816
 
2.3%
Uppercase Letter 102816
 
2.3%
Open Punctuation 19656
 
0.4%
Close Punctuation 19656
 
0.4%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
n 595728
15.6%
t 535248
14.0%
i 530712
13.9%
o 491400
12.8%
s 364392
9.5%
r 184464
 
4.8%
u 182952
 
4.8%
e 164808
 
4.3%
f 102816
 
2.7%
p 102816
 
2.7%
Other values (11) 571536
14.9%
Uppercase Letter
ValueCountFrequency (%)
T 42336
41.2%
G 21168
20.6%
N 19656
19.1%
B 19656
19.1%
Space Separator
ValueCountFrequency (%)
306936
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 102816
100.0%
Open Punctuation
ValueCountFrequency (%)
( 19656
100.0%
Close Punctuation
ValueCountFrequency (%)
) 19656
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 3929688
89.7%
Common 449064
 
10.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
n 595728
15.2%
t 535248
13.6%
i 530712
13.5%
o 491400
12.5%
s 364392
9.3%
r 184464
 
4.7%
u 182952
 
4.7%
e 164808
 
4.2%
f 102816
 
2.6%
p 102816
 
2.6%
Other values (15) 674352
17.2%
Common
ValueCountFrequency (%)
306936
68.4%
- 102816
 
22.9%
( 19656
 
4.4%
) 19656
 
4.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 4378752
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
n 595728
13.6%
t 535248
12.2%
i 530712
12.1%
o 491400
11.2%
s 364392
 
8.3%
306936
 
7.0%
r 184464
 
4.2%
u 182952
 
4.2%
e 164808
 
3.8%
f 102816
 
2.3%
Other values (19) 919296
21.0%

Characteristics
Categorical

Distinct: 18
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Male employees: 5880
Not a visible minority: 5880
55 to 64 years: 5880
Female employees: 5880
High school diploma and less: 5880
Other values (13): 73416

Length

Max length: 33
Median length: 28
Mean length: 19.415033
Min length: 14

Characters and Unicode

Total characters: 1996176
Distinct characters: 37
Distinct categories: 5
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Male employees
2nd row: Male employees
3rd row: Male employees
4th row: Male employees
5th row: Male employees

Common Values

Value | Count | Frequency (%)
Male employees | 5880 | 5.7%
Not a visible minority | 5880 | 5.7%
55 to 64 years | 5880 | 5.7%
Female employees | 5880 | 5.7%
High school diploma and less | 5880 | 5.7%
College diploma | 5880 | 5.7%
Visible minority | 5880 | 5.7%
25 to 34 years | 5880 | 5.7%
35 to 44 years | 5880 | 5.7%
45 to 54 years | 5880 | 5.7%
Other values (8) | 44016 | 42.8%

Length

Histogram of lengths of the category
ValueCountFrequency (%)
years 34272
 
10.4%
employees 33936
 
10.2%
to 28896
 
8.7%
and 16800
 
5.1%
visible 11760
 
3.6%
minority 11760
 
3.6%
diploma 11760
 
3.6%
identity 11088
 
3.3%
male 5880
 
1.8%
college 5880
 
1.8%
Other values (28) 159096
48.0%

Most occurring characters

ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1561896
78.2%
Space Separator 228312
 
11.4%
Decimal Number 126336
 
6.3%
Uppercase Letter 68544
 
3.4%
Dash Punctuation 11088
 
0.6%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 255024
16.3%
i 147840
9.5%
o 142800
9.1%
s 114240
 
7.3%
a 102648
 
6.6%
l 98112
 
6.3%
y 96600
 
6.2%
t 96432
 
6.2%
r 90216
 
5.8%
n 89544
 
5.7%
Other values (10) 328440
21.0%
Uppercase Letter
ValueCountFrequency (%)
N 16968
24.8%
I 11088
16.2%
M 5880
 
8.6%
V 5880
 
8.6%
C 5880
 
8.6%
H 5880
 
8.6%
F 5880
 
8.6%
U 5544
 
8.1%
T 5544
 
8.1%
Decimal Number
ValueCountFrequency (%)
5 46032
36.4%
4 40656
32.2%
3 11760
 
9.3%
2 11256
 
8.9%
6 11256
 
8.9%
1 5376
 
4.3%
Space Separator
ValueCountFrequency (%)
228312
100.0%
Dash Punctuation
ValueCountFrequency (%)
- 11088
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1630440
81.7%
Common 365736
 
18.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 255024
15.6%
i 147840
 
9.1%
o 142800
 
8.8%
s 114240
 
7.0%
a 102648
 
6.3%
l 98112
 
6.0%
y 96600
 
5.9%
t 96432
 
5.9%
r 90216
 
5.5%
n 89544
 
5.5%
Other values (19) 396984
24.3%
Common
ValueCountFrequency (%)
228312
62.4%
5 46032
 
12.6%
4 40656
 
11.1%
3 11760
 
3.2%
2 11256
 
3.1%
6 11256
 
3.1%
- 11088
 
3.0%
1 5376
 
1.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 1996176
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 255024
12.8%
228312
 
11.4%
i 147840
 
7.4%
o 142800
 
7.2%
s 114240
 
5.7%
a 102648
 
5.1%
l 98112
 
4.9%
y 96600
 
4.8%
t 96432
 
4.8%
r 90216
 
4.5%
Other values (27) 623952
31.3%

Indicators
Categorical

HIGH CORRELATION  UNIFORM 

Distinct: 7
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Number of jobs: 14688
Hours worked: 14688
Wages and salaries: 14688
Average annual hours worked: 14688
Average weekly hours worked: 14688
Other values (2): 29376

Length

Max length: 33
Median length: 19
Mean length: 21.428571
Min length: 12

Characters and Unicode

Total characters: 2203200
Distinct characters: 25
Distinct categories: 3
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Number of jobs
2nd row: Hours worked
3rd row: Wages and salaries
4th row: Average annual hours worked
5th row: Average weekly hours worked

Common Values

Value | Count | Frequency (%)
Number of jobs | 14688 | 14.3%
Hours worked | 14688 | 14.3%
Wages and salaries | 14688 | 14.3%
Average annual hours worked | 14688 | 14.3%
Average weekly hours worked | 14688 | 14.3%
Average annual wages and salaries | 14688 | 14.3%
Average hourly wage | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
average 58752
16.7%
hours 44064
12.5%
worked 44064
12.5%
wages 29376
8.3%
and 29376
8.3%
salaries 29376
8.3%
annual 29376
8.3%
number 14688
 
4.2%
of 14688
 
4.2%
jobs 14688
 
4.2%
Other values (3) 44064
12.5%

Most occurring characters

ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 1850688
84.0%
Space Separator 249696
 
11.3%
Uppercase Letter 102816
 
4.7%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
e 279072
15.1%
a 249696
13.5%
r 205632
11.1%
s 146880
 
7.9%
o 132192
 
7.1%
u 102816
 
5.6%
g 102816
 
5.6%
n 88128
 
4.8%
l 88128
 
4.8%
w 88128
 
4.8%
Other values (10) 367200
19.8%
Uppercase Letter
ValueCountFrequency (%)
A 58752
57.1%
W 14688
 
14.3%
H 14688
 
14.3%
N 14688
 
14.3%
Space Separator
ValueCountFrequency (%)
249696
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin 1953504
88.7%
Common 249696
 
11.3%

Most frequent character per script

Latin
ValueCountFrequency (%)
e 279072
14.3%
a 249696
12.8%
r 205632
10.5%
s 146880
 
7.5%
o 132192
 
6.8%
u 102816
 
5.3%
g 102816
 
5.3%
n 88128
 
4.5%
l 88128
 
4.5%
w 88128
 
4.5%
Other values (14) 470016
24.1%
Common
ValueCountFrequency (%)
249696
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII 2203200
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
e 279072
12.7%
a 249696
11.3%
249696
11.3%
r 205632
 
9.3%
s 146880
 
6.7%
o 132192
 
6.0%
u 102816
 
4.7%
g 102816
 
4.7%
n 88128
 
4.0%
l 88128
 
4.0%
Other values (15) 558144
25.3%

UOM
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

Hours: 44064
Dollars: 44064
Jobs: 14688

Length

Max length: 7
Median length: 5
Mean length: 5.7142857
Min length: 4

Characters and Unicode

Total characters: 587520
Distinct characters: 10
Distinct categories: 2
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: Jobs
2nd row: Hours
3rd row: Dollars
4th row: Hours
5th row: Hours

Common Values

Value | Count | Frequency (%)
Hours | 44064 | 42.9%
Dollars | 44064 | 42.9%
Jobs | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
hours 44064
42.9%
dollars 44064
42.9%
jobs 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 484704
82.5%
Uppercase Letter 102816
 
17.5%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
o 102816
21.2%
s 102816
21.2%
r 88128
18.2%
l 88128
18.2%
u 44064
9.1%
a 44064
9.1%
b 14688
 
3.0%
Uppercase Letter
ValueCountFrequency (%)
H 44064
42.9%
D 44064
42.9%
J 14688
 
14.3%

Most occurring scripts

ValueCountFrequency (%)
Latin 587520
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

Most occurring blocks

ValueCountFrequency (%)
ASCII 587520
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
o 102816
17.5%
s 102816
17.5%
r 88128
15.0%
l 88128
15.0%
H 44064
7.5%
u 44064
7.5%
D 44064
7.5%
a 44064
7.5%
J 14688
 
2.5%
b 14688
 
2.5%

SCALAR_FACTOR
Categorical

HIGH CORRELATION 

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 1.6 MiB

units: 73440
thousands: 14688
millions: 14688

Length

Max length: 9
Median length: 5
Mean length: 6
Min length: 5

Characters and Unicode

Total characters: 616896
Distinct characters: 11
Distinct categories: 1
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: units
2nd row: thousands
3rd row: millions
4th row: units
5th row: units

Common Values

Value | Count | Frequency (%)
units | 73440 | 71.4%
thousands | 14688 | 14.3%
millions | 14688 | 14.3%

Length

Histogram of lengths of the category

Common Values (Plot)

ValueCountFrequency (%)
units 73440
71.4%
thousands 14688
 
14.3%
millions 14688
 
14.3%

Most occurring characters

ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter 616896
100.0%

Most frequent character per category

Lowercase Letter
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring scripts

ValueCountFrequency (%)
Latin 616896
100.0%

Most frequent character per script

Latin
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

Most occurring blocks

ValueCountFrequency (%)
ASCII 616896
100.0%

Most frequent character per block

ASCII
ValueCountFrequency (%)
s 117504
19.0%
n 102816
16.7%
i 102816
16.7%
u 88128
14.3%
t 88128
14.3%
o 29376
 
4.8%
l 29376
 
4.8%
h 14688
 
2.4%
a 14688
 
2.4%
d 14688
 
2.4%

VALUE
Real number (ℝ)

Distinct: 35398
Distinct (%): 34.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 26419.543
Minimum: 0
Maximum: 3857813
Zeros: 9
Zeros (%): < 0.1%
Negative: 0
Negative (%): 0.0%
Memory size: 1.6 MiB

Quantile statistics

Minimum: 0
5-th percentile: 18.6075
Q1: 32
Median: 1386
Q3: 17660
95-th percentile: 85576.5
Maximum: 3857813
Range: 3857813
Interquartile range (IQR): 17628

Descriptive statistics

Standard deviation: 118041.35
Coefficient of variation (CV): 4.4679558
Kurtosis: 266.93641
Mean: 26419.543
Median Absolute Deviation (MAD): 1358
Skewness: 13.831207
Sum: 2.7163518 × 10^9
Variance: 1.393376 × 10^10
Monotonicity: Not monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
30 | 2040 | 2.0%
31 | 1949 | 1.9%
32 | 1844 | 1.8%
29 | 1562 | 1.5%
33 | 1476 | 1.4%
34 | 1005 | 1.0%
28 | 905 | 0.9%
35 | 638 | 0.6%
27 | 523 | 0.5%
36 | 457 | 0.4%
Other values (35388) | 90417 | 87.9%

Value | Count | Frequency (%)
0 | 9 | < 0.1%
1 | 219 | 0.2%
2 | 299 | 0.3%
3 | 201 | 0.2%
4 | 203 | 0.2%
5 | 157 | 0.2%
6 | 147 | 0.1%
7 | 135 | 0.1%
8 | 179 | 0.2%
9 | 145 | 0.1%

Value | Count | Frequency (%)
3857813 | 1 | < 0.1%
3690238 | 1 | < 0.1%
3643952 | 1 | < 0.1%
3581317 | 1 | < 0.1%
3556730 | 1 | < 0.1%
3544822 | 1 | < 0.1%
3487265 | 1 | < 0.1%
3425218 | 1 | < 0.1%
3402343 | 1 | < 0.1%
3372266 | 1 | < 0.1%

Interactions

[Figures: four interaction plots for the numeric variables REF_DATE and VALUE]

Correlations

                 REF_DATE  VALUE  DGUID  GEO    Sector  Characteristics  Indicators  UOM    SCALAR_FACTOR
REF_DATE         1.000     0.027  0.000  0.000  0.000   0.000            0.000       0.000  0.000
VALUE            0.027     1.000  0.090  0.090  0.051   0.044            0.075       0.081  0.105
DGUID            0.000     0.090  1.000  1.000  0.029   0.023            0.000       0.000  0.000
GEO              0.000     0.090  1.000  1.000  0.029   0.023            0.000       0.000  0.000
Sector           0.000     0.051  0.029  0.029  1.000   0.018            0.000       0.000  0.000
Characteristics  0.000     0.044  0.023  0.023  0.018   1.000            0.000       0.000  0.000
Indicators       0.000     0.075  0.000  0.000  0.000   0.000            1.000       1.000  1.000
UOM              0.000     0.081  0.000  0.000  0.000   0.000            1.000       1.000  0.447
SCALAR_FACTOR    0.000     0.105  0.000  0.000  0.000   0.000            1.000       0.447  1.000
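
The 1.000 and 0.447 entries among Indicators, UOM, and SCALAR_FACTOR can be reproduced with uncorrected Cramér's V. This is a sketch under assumptions: the report does not state which association measure it uses for categorical pairs, and the data below is a hand-built stand-in using the seven Indicators/UOM/SCALAR_FACTOR combinations visible in the Sample section, each counted once (the Alerts describe Indicators as uniformly distributed). `cramers_v` is an illustrative helper, not a library function.

```python
from collections import Counter
from math import sqrt

def cramers_v(xs, ys):
    """Uncorrected Cramér's V between two equal-length categorical sequences."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    row, col = Counter(xs), Counter(ys)
    chi2 = 0.0
    for x, rx in row.items():
        for y, cy in col.items():
            expected = rx * cy / n
            observed = joint.get((x, y), 0)
            chi2 += (observed - expected) ** 2 / expected
    k = min(len(row), len(col)) - 1
    return sqrt(chi2 / (n * k)) if k > 0 else 0.0

# One row per indicator, copied from the sample rows of this report.
combos = [
    ("Number of jobs", "Jobs", "units"),
    ("Hours worked", "Hours", "thousands"),
    ("Wages and salaries", "Dollars", "millions"),
    ("Average annual hours worked", "Hours", "units"),
    ("Average weekly hours worked", "Hours", "units"),
    ("Average annual wages and salaries", "Dollars", "units"),
    ("Average hourly wage", "Dollars", "units"),
]
indicators, uom, scalar_factor = (list(c) for c in zip(*combos))

print(round(cramers_v(indicators, uom), 3))
print(round(cramers_v(indicators, scalar_factor), 3))
print(round(cramers_v(uom, scalar_factor), 3))
```

Under these assumptions the first two pairs score 1 because each indicator fully determines its unit of measure and its scalar factor, while UOM and SCALAR_FACTOR are only partly related to each other. The same reasoning explains the DGUID/GEO pair: one is an identifier for the other.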

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.

Sample

        REF_DATE  DGUID           GEO      Sector                              Characteristics        Indicators                         UOM      SCALAR_FACTOR  VALUE
0       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Number of jobs                     Jobs     units          642584.00
1       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Hours worked                       Hours    thousands      1048516.00
2       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Wages and salaries                 Dollars  millions       30805.00
3       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Average annual hours worked        Hours    units          1632.00
4       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Average weekly hours worked        Hours    units          31.00
5       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Average annual wages and salaries  Dollars  units          47940.00
6       2010      2016A000011124  Canada   Total non-profit institutions       Male employees         Average hourly wage                Dollars  units          29.38
7       2010      2016A000011124  Canada   Total non-profit institutions       Female employees       Number of jobs                     Jobs     units          1500394.00
8       2010      2016A000011124  Canada   Total non-profit institutions       Female employees       Hours worked                       Hours    thousands      2331018.00
9       2010      2016A000011124  Canada   Total non-profit institutions       Female employees       Wages and salaries                 Dollars  millions       60943.00

        REF_DATE  DGUID           GEO      Sector                              Characteristics        Indicators                         UOM      SCALAR_FACTOR  VALUE
105830  2021      2016A000262     Nunavut  Government non-profit institutions  55 to 64 years         Average weekly hours worked        Hours    units          33.00
105831  2021      2016A000262     Nunavut  Government non-profit institutions  55 to 64 years         Average annual wages and salaries  Dollars  units          101380.00
105832  2021      2016A000262     Nunavut  Government non-profit institutions  55 to 64 years         Average hourly wage                Dollars  units          59.98
105833  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Number of jobs                     Jobs     units          27.00
105834  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Hours worked                       Hours    thousands      30.00
105835  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Wages and salaries                 Dollars  millions       2.00
105836  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Average annual hours worked        Hours    units          1111.00
105837  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Average weekly hours worked        Hours    units          21.00
105838  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Average annual wages and salaries  Dollars  units          74037.00
105839  2021      2016A000262     Nunavut  Government non-profit institutions  65 years old and over  Average hourly wage                Dollars  units          66.63
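
Because VALUE is stored alongside a SCALAR_FACTOR, values from different indicators have to be rescaled before they can be combined. The sketch below assumes that SCALAR_FACTOR is a plain multiplier (units = 1, thousands = 1,000, millions = 1,000,000) and that the "average" indicators are ratios of the totals; neither is stated in the report. It uses the first three sample rows (Canada, 2010, male employees), and the `SCALAR` mapping and variable names are illustrative.

```python
# Assumed multipliers for the SCALAR_FACTOR column (an assumption, see above).
SCALAR = {"units": 1, "thousands": 1_000, "millions": 1_000_000}

# (SCALAR_FACTOR, VALUE) pairs copied from sample rows 0 to 2.
rows = {
    "Number of jobs": ("units", 642584.00),
    "Hours worked": ("thousands", 1048516.00),
    "Wages and salaries": ("millions", 30805.00),
}

def unscaled(indicator):
    """Return VALUE multiplied by its assumed scalar factor."""
    factor, value = rows[indicator]
    return value * SCALAR[factor]

jobs = unscaled("Number of jobs")
hours = unscaled("Hours worked")
wages = unscaled("Wages and salaries")

avg_annual_hours = hours / jobs   # sample row 3 reports 1632.00
avg_annual_wages = wages / jobs   # sample row 5 reports 47940.00
avg_hourly_wage = wages / hours   # sample row 6 reports 29.38

print(f"{avg_annual_hours:.1f} {avg_annual_wages:.1f} {avg_hourly_wage:.2f}")
```

The recomputed averages land close to sample rows 3, 5, and 6, but not exactly on them, since the totals are published rounded to thousands of hours and millions of dollars.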